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We have implemented a variant of the force gradient integrator proposed by Kennedy et.al. and are 
using it in our production 2+1 flavor DWF simulations with pion masses of 180 MeV in (4.5fm)^ 
volumes. We find modest speed-ups 20%) from using the force gradient integrator, compared 
to our previously used Omelyan integrator. On other ensembles, primarily finite temperature 
2+1 flavor DWF QCD, we have extensively tuned the Hasenbusch preconditioning masses and 
achieved speed-ups of 2-3x. Here we have also switched to the force gradient integrator, but this 
change has not had any impact on the speed. We also report on an improved solver for DWF, 
which uses Mobius fermions, with a smaller fifth dimension than the original DWF fermions, as 
an intermediate step in the generation of solutions of the Dirac equation. This approach cuts the 
number of effective Dirac applications by approximately a factor of 2 when the conjugate gradient 
iteration count is large. 
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1. Introduction 

Ensemble generation constitutes a large portion of the computing time in a lattice calculation, 
and in many cases is the most time consuming part. Most lattice calculations use a variant of the 
Hybrid Monte Carlo (HMC) algorithm to generate lattice configurations for later measurements. 
As such, optimizing the HMC integration scheme is important for many lattice calculations. 

We make use of the Hasenbusch mass splitting method and the recently devised force gradient 
integrator to optimize the molecular dynamics evolution in our lattice computations. In particular 
we propose a very efficient way to implement the force gradient integrator. The use of this new 
integrator in combination with the Hasenbusch mass splitting scheme has achieved a factor of 2 
speed up in many of our practical calculations. 

Domain wall fermions (DWF) are widely used for many lattice QCD calculations. To extend 
existing efforts on DWF physics, we present a method that employs the similarity between the 
kernels of Mobius fermion and standard DWF to help accelerate the solution of the DWF Dirac 
equations. 

2. Force Gradient Integrator Implementation 

As shown in [1] the force gradient PQPQP integrator (FGI) can be expressed as 

g 5 J ^Crjr^s- 72 ?{5.{s.r}} <rg ^ rfs (2. 1) 

The key step for this integrator involves the following update on the momentum field 

Pi ^ Pi - l^eiiS) + ^z^ejiS)ejei{S) (2.2) 

ei can be seen as the generalization of the vector fields d/dx in the SU(3) Lie group space 
of link variables. It follows the Leibniz rule and when acting on the gauge field it is effectively 
equivalent to left multiplication by the corresponding generator ei{U) = TiU. 

To implement the second order derivative term {S)ejei{S) in FGI, an approach using the 
Taylor expansion can be used. Since e, and ej are interchangeable in this extra term, the following 
approximation holds 

^xe^iS) - ^T'e^{S)eje^iS) = - ^^'^'^j) ^K^) = ^T^^-^^'^'^'eK^) + ^(t^) (2.3) 

Here = e^{S) and it must be treated as independent of link variables in the above equation(e/ 
acting on F^ is defined to be despite the fact that F-i depends on the gauge field). Note that since 
the FGI has a local error of the above approximation does not change the size of the error. 

Effectively we obtain from the last expression 

e-'>'^^ei{S) = ei{S[U']) (2.4) 

with 

U' = e-i^''^JU = e-i'''^^U (2.5) 
As a consequence we implement the force gradient step using the following two-step approach 
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1. (the force gradient step) Calculate = e-'{S[U]). Then update the gauge field temporarily 
according to U' = e^^^^'^'U . 

2. (the regular step) Calculate again ei{S[U']) using the updated gauge field. The corresponding 
force is added to the momentum field. Upon the completion of this step, restore the gauge 
field to U. 

The above implementation of the force gradient integrator is different from [1], since the ap- 
proximation in eq.(2.3), changes the i^(t^) error term. The resulting integrator is still symmetric 
and symplectic. 

There is an additional benefit from using this implementation. The two steps described above 
are exactly the same except using different gauge fields U and U', and U' differs from U by only 
a small factor (~ T^/24). If a fermion action is involved, the solution from the first step can be 
used as an initial guess for the second step when solving the Dirac equations for the force terms. 
This "force gradient forecasting" reduces costs significantly. Unlike the chronological solver, it is 
independent of the direction of the molecular dynamics time and does not impact reversibility. 

When testing the force gradient integrator on a 16^ x 32 x 16 lattice with 2+1 flavor dynamical 
DWF fermions (420MeV pion) using quotient and rational quotient actions, we were able to raise 
the top level step size from 1/4 as in the Omelyan to 1/3 for a unit trajectory length. However, due 
to the extra cost in the force gradient solve, the new integrator does not provide a significant speed 
up over a finely tuned Omelyan integrator in the case we tested. For larger 32^ volumes, we have 
seen a modest benefit from the FGI and will test on 48^ volumes soon, where the benefit may be 
larger. 



3. Hasenbusch Mass Splitting 

The Hasenbusch factorization of the DWF fermion determinant entering in our evolutions 

\D^l)D{l) J \D'^{m + pi)D{m + pi)) \ D\\)D{\) J ^ ' 

has been proposed in[2] to separate the contribution to the fermion force from the light eigenmodes 
and heavier ones. The inversion of D\m)D{m) is more expensive than the introduced operator 
D^(m + /x)D(m + n). Urbach et al.[3] proposed that by assigning a smaller force to the more 
expensive part (via tuning jj.) and putting it on a coai^ser time scale by means of a multiple level 
integrator, an acceleration of the molecular dynamics evolution can be achieved. We have used 
this approach for a number of yeai^s, but there are overheads associated with multiple time scale 
integrators. 

Here we propose an approach that uses the idea of mass splitting, but without different time 
scales. Instead of putting the part with larger force on a finer time scale, we split the force further 
via introducing additional intermediate masses. Namely we introduce 

\Dt{l)D{l)J l\ V Dt(m + At;)£»("i + At,-) J 
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with = /^o < Ml < • • • < M/t+i = 1- One advantage of this scheme over [3] is that all interme- 
diate masses = 1,2, •• • ,k) can be tuned continuously, while any multiple time-step integrator 
scheme only admits a 1 : « ratio on the number of force term evaluations, with integer n being the 
number of evaluations of the nested integrator per evaluation of the upper level integrator. 

The observed benefit of this scheme is that all Dirac equations related to the force terms of the 
quotient actions need not be solved very accurately, if enough intermediate masses are introduced 
and tuned. We were generally able to solve the force term Dirac equations only up to a residual of 
10^^ instead of the previously used 10^^, thus saving a considerable amount of time. 



4. Mobius Accelerated Domain Wall Fermion (MADWF) 

We start by considering the general Dirac equation, 

Di)wi>n)x = b, 



(4.1) 



with Ddw being the standai^d domain wall fermion Dirac operator. The goal is to use Mobius 
fermions to construct an approximate equation. Before solving equation (4.1), we solve the ap- 
proximated equation and use the solution as an initial guess. 

The primary tool to relate the standard domain wall fermion and the Mobius fermion is the 
domain wall - overlap transformation [5, 6, 7]. It relates the Mobius Dirac operator to an equivalent 
4D overlap operator. By applying /'^^D^^(l) to both sides of equation (4.1)', we can transform 
the 5D Dirac matrix into a diagonal form in the s direction. 



(4.2) 
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with y = (3'i,3'2)3'3) • ■ ■ jJl)^ = ^ ^ is the size of the s direction and 

C = (ci,C2,C3,-- - ^ctf =P^'^DDw{'^Y^b. 

The details, including the definition of symbols ^2, • • • ,Sl and d, can be found in [5]. 
After the transformation the primary work is to solve the 4D equation 

Dov{m)y\ =c\ 



(4.3) 



(4.4) 



with an overlap-like operator Dgv (m) . The rest of the work involves deriving the whole 5D solution 
fromyi, which can be computed with little effort, as will be seen below. 

We first address the question of solving the 4D equation (4.4). We notice that Dov{m) is an 
approximation to the ideal overlap operator. In parallel we can also obtain a different finite approx- 
imation Dpy^m) from any 5D Mobius operator D'JJ^,{m). For our purpose we replace Dov{m) in 



'P is a simple matrix that shifts half of the spin component in the fermion vector by one in s direction, its detailed 
form can be found in [5]. 
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equation (4.4), which comes from a standard domain wall fermion operator, by D'^y (m) that comes 
from a Mobius fermion operator. The solution y\ to the following equation 

D'ovim)y\=ci (4.5) 

can be directly used to approximate yi, and thus reconstruct a 5D approximated solution to (4.1). 

To derive the entire 5D solution y from ji , we notice that for a true solution y = {yi,y2,y3i - ■ ■ , Jl)^ 
we have 

yk = Ck-SkSk+\---SL{Dov -'^)dyu k = 2,3,--- ,L. (4.6) 

This relation can also be used to construct the guessed solution once y[, the approximation to y\, is 
obtained. Namely, 

y',^ = Ck-SkSk+i---SLiDov-'i^)dy'i, k = 2,3,--- ,L, (4.7) 
where represents the approximation to y^. This can be written in matrix form 

{-Dovy[,y2,y3r-- ,3'!)'^ =^"'£>w(1)"'Odw("i)^- {-y\,C2,C3,--- (4.8) 

Reconstructing the entire 5D guess in this way requires a Pauli-Villars DWF solve. 

There remains the question of solving the 4D Mobius Dirac equation (4.5). To do this we 
reverse the above process. The detailed procedure can be found at [5, 6, 7]. By using the domain 
wall - overlap transformation on the 4D equation (4.5) we can transform it back to the 5D form 

D'j,^{m)Py'=D'o„{l)Pc' (4.9) 

with c' = (ci , 0, 0, • • • ,0) and y\ being the 4D component of / on the 5 = 1 hyperplane. 
The algorithm then looks like 

1. Construct ci from the original 5D source b, using equation (4.3). 

2. Replace the 4D operator in equation (4.4) by a 4D operator from a Mobius fermion operator 
with appropriately chosen parameters. 

3. Transform the 4D Mobius Dirac equation back to its 5D form (4.9) and solve the 5D equation, 
thus obtaining the 4D solution y\ to equation (4.5). 

4. Use y\ as an approximation to the 4D domain wall Dirac equation (4.4), and reconstruct the 
5D (approximated) solution using (4.8). 

The above process requires two Pauli-Villars solves in standard DWF ((4.3) and (4.8)) and one 
Mobius solve on a lattice with possibly smaller size in s direction (4.9). The gain over a direct 
solve comes from the fact that the Mobius equation has a potentially smaller s dimension size. It's 
clear that the approximated solution y\ must be sufficiently close to the true solution y\ to make 
this method beneficial. This leads to the problem of choosing the best parameters. The details of 
parameter tuning will be given later, in a separate paper. 
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Figure 1: DWF CG count/Mobius CG count correlation, 160MeV, 32^ x 8, with L = 32, L' ^ 16, b = 1.5, 
c = 0.5. A turning point is clearly presented in the curve, representing the best approximation can be obtained 
from solving the Mobius Dirac equation. 



5. Defect Correction and Production Use 

The Mobius approximation to the standard DWF Dirac equation can only achieve a certain 
accuracy (Figure (1)), as the two operators are not exactly the same. A simple defect-coiTcction 
method can be used to achieve a higher accuracy in constructing the guessed solution. In this 
approach, one takes the guessed solution and calculates the residue, then constructs a second guess 
with the residue as the new source vector using the Mobius approach. The second guess can be 
simply added to the original guess to improve it. This method can be repeated if necessary. 





Direct CG solve 


Mobius Accelerated DWF 


Operation 




Mobius equation(*3) 


Pauh-Villars(*6) 


others 


total 


Op. Count 


11290*32=3.6e5 


4.6e3*12 


1.0e2*32 




2.0e5 


time(s) 


2672 


285 


25 


125 


1138 



Table 1: Cost comparison of MADWF CG solver with a regular (zero started) CG solver. L — 32, L' = 12. 
Data obtained from a 512 node partition onBG/P, both solve to 10^'". Note that there are 3 Defect-correction 
steps, each includes 2 Pauli-Villars unit mass inversions. The factor 32 and 12 in the "Op. Count" row are 
due to different 5th dimension sizes. 



Table (1) compares this method and the direct solve approach on a 32^ x 64, L = 32 lattice. 
The corresponding Mobius approximation has 5th dimension size L' = 12. Since multiple 5th 
dimension sizes are used we count the number of 4D Wilson Dirac operator applications (denoted 
by "Op. count" in the table). The table shows that MADWF reduces the number of 4D Wilson 
Dirac operator applications from 3.6e5 to 2.0e5, by a factor of 1.8. The MADWF solver uses 
42% of wall clock time, when compared with a direct solve. Note that while some gain is due 
to code optimization, most is due to the fact that MADWF requires many fewer 4D Wilson Dirac 
operations. 

We also note that the Mobius parameters need only be tuned once for a specific ensemble. 
Changing configuration or source vector has little effect on the quality of the Mobius approxima- 
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tion. Once tuned, the same set of Mobius parameters can be used to approximate the DWF Dirac 
equation for the entire ensemble. 

6. Conclusion 

We have presented a method to implement the force gradient integrator. By testing the force 
gradient integrator on an existing lattice we have shown that for small lattices the extra cost is high 
enough so any benefit is small. However, due to its better scaling behavior it's possible that the 
force gradient integrator performs better than other second order integrators on larger lattices or 
with longer trajectories. 

We have explored the Hasenbusch mass spUtting scheme and applied this method to our prac- 
tical calculation. We have found that extra quotient actions introduced in this way allows the Dirac 
equations associated with the force term evaluations to be solved less accurately, reducing the cost 
of HMC considerably. 

We have shown that the similarity between the 4D Mobius fermion Dirac operator and the 
Shamir domain wall fermion operator can be used to accelerate solving the DWF Dirac equation. 
We have achieved a factor of 2 speed up on a specific large lattices (32^ x 64, = 32) that we are 
using in practical calculations. 

7. Acknowledgment 

We thank A. Kennedy and M. Clark for notes regarding the force gradient integrator and a 
prototype of a force gradient C++ implementation. We also thank our colleagues in the RBC 
and UKQCD collaborations for discussions and tests on the program. This work is done using 
the QCDOC supercomputers of the RBRC group, NYBlue at BNL and the BG/P of the ALCF at 
Argonne. We use the MDWF package for simulating the Mobius fermions. 

References 

[1] A. D. Kennedy, M. A. Clark and P. J. Silva, Force Gradient Integrators, PoS LAT2009:021,2009 

[2] M. Hasenbusch, Speeding up the hybrid Monte Carlo algorithm for dynamical fermions, Physics 
Letters B 519 (2001) 177 - 182. 

[3] C. Urbach, K. Jansen, A. Shindler, U. Wenger, HMC algorithm with multiple time scale integration 
and mass preconditioning, Comput. Phys. Commun., 174(2006) 87 - 98. 

[4] R.C. Brower, H. Neff and K. Orginos, Mobius Fermions: Improved Domain Wall Chiral Fermions, 
Nucl. Phys. B(Proc. Suppl.) 140(2005) 686-688. 

[5] R.C. Brower, H. Neff and K. Orginos, Mobius Fermions, Nucl. Phys. B(Proc. Suppl.) 153(2006) 
191-198. 

[6] A. Borigi, Computational methods for the fermion determinant and the link between overlap and 
domain wall fermions, arXiv:hep-lat/0402035vl 

[7] A. Allkofi and A. Bori9i, Reducing the beta-shift in domain wall fermion simulations, PoS LAT2005 
(2005) 099 



7 



